\chapter[QST with MUBs]{Quantum state tomography with mutually-unbiased bases }
\begin{quote}
One should perform his deeds for the benefit of mankind with an
unbiased approach because bias gives birth to evil, which creates
thousands of obstacles in our path.\\
--The Rig Veda
\end{quote}
\section{Introduction}
Since quantum state tomography was first introduced as an experimental
tool, relatively little research has been focused on improving
it.  In fact, as quantum state tomography techniques have spread to
other physical systems of entangled particles such as trapped ions\cite{Haffner2004},
superconducting qubits\cite{Steffen2006},
and quantum dots\cite{Stevenson2006}, the prescription for doing
quantum state tomography
given by James, Kwiat, Munro and White\cite{James2001} has been
implemented almost exactly as it was originally laid down.  

There is no particular reason why this should be the case.  The projectors
measured in \cite{James2001} were picked for experimental
convenience, but in principle, any set of measurements could have been
used, as long as they form a spanning set for the Hilbert space of
density matrices.  The arbitrariness of the set used by James
et al. is apparent upon inspection of Table
\ref{tab:James_measurements}.  Sixteen projections were taken, the
minimal number required for completeness, and all of them were onto
tensor products of eigenstates of the $\sigma_x$,
$\sigma_y$ and $\sigma_z$ Pauli operators in the two qubits.  Because
there are 36 such tensor products but only 16 projectors in the
set, twenty of the combinations are excluded, and since there is no
natural way to select the 16 included and 20 excluded
measurements, the selected set appears unbalanced.  The first four
measurements are projectors onto eigenstates of $\sigma_z\otimes
\sigma_z$, but this is the only basis for which all four outcomes are
included.  $\sigma_y\otimes\sigma_z$, $\sigma_x\otimes\sigma_z$,
$\sigma_z\otimes\sigma_x$ and $\sigma_z\otimes\sigma_y$ have two
projectors each in the set, and $\sigma_x\otimes\sigma_x$,
$\sigma_x\otimes\sigma_y$, $\sigma_y\otimes\sigma_x$ and
$\sigma_y\otimes\sigma_y$ have one projector each.  As one might expect,
this choice of projectors results in a much better estimate of
the polarization in the $Z$ direction than in the $X$ and $Y$
directions.  Computer simulations show that with $10{,}000$
copies of the state available for measurement, the James strategy gives a $53\%$
higher standard deviation for $\sigma_y\otimes\sigma_y$ outcomes than
for $\sigma_z\otimes\sigma_z$ outcomes for an input state such as
$\ket{DD}$, which gives the same average value for the measurements in
the two bases.

\begin{table}
\begin{center}
\begin{tabular}{|c|c|c|cccc|}
\hline
$\nu$& Mode 1& Mode 2& $h_{1}$ & $ q_{1}$ & $h_{2}$ & $q_{2}$\\ \hline
1&$|{\rm H}\rangle$&$|{\rm H}\rangle$  & $45^{o}$   & $0$      & $45^{o}$   & $0$      \\
2&$|{\rm H}\rangle$&$|{\rm V}\rangle$  & $45^{o}$   & $0$      &   $0$      & $0$      \\
3&$|{\rm V}\rangle$&$|{\rm V}\rangle$  &   $0$      & $0$      &   $0$      & $0$      \\
4&$|{\rm V}\rangle$&$|{\rm H}\rangle$  &   $0$      & $0$      & $45^{o}$   & $0$      \\
5&$|{\rm R}\rangle$&$|{\rm H}\rangle$  & $22.5^{o}$ & $0$      & $45^{o}$   & $0$      \\
6&$|{\rm R}\rangle$&$|{\rm V}\rangle$ & $22.5^{o}$ & $0$      &   $0$      & $0$      \\
7&$|{\rm D}\rangle$&$|{\rm V}\rangle$ &$22.5^{o}$ & $45^{o}$ &   $0$      & $0$      \\
8&$|{\rm D}\rangle$&$|{\rm H}\rangle$  &$22.5^{o}$ & $45^{o}$ & $45^{o}$   & $0$      \\
9&$|{\rm D}\rangle$&$|{\rm R}\rangle$ &$22.5^{o}$ & $45^{o}$ & $22.5^{o}$ & $0$      \\
10&$|{\rm D}\rangle$&$|{\rm D}\rangle$  &$22.5^{o}$ & $45^{o}$ &$22.5^{o}$ & $45^{o}$ \\
11&$|{\rm R}\rangle$&$|{\rm D}\rangle$ & $22.5^{o}$ & $0$      &$22.5^{o}$ & $45^{o}$ \\
12&$|{\rm H}\rangle$&$|{\rm D}\rangle$  & $45^{o}$   & $0$      &$22.5^{o}$ & $45^{o}$ \\
13&$|{\rm V}\rangle$&$|{\rm D}\rangle$ &   $0$      & $0$      &$22.5^{o}$ & $45^{o}$ \\
14&$|{\rm V}\rangle$&$|{\rm L}\rangle$ &   $0$      & $0$      &$22.5^{o}$  & $90^{o}$ \\ 
15&$|{\rm H}\rangle$&$|{\rm L}\rangle$  & $45^{o}$   & $0$      &$22.5^{o}$  & $90^{o}$ \\
16&$|{\rm R}\rangle$&$|{\rm L}\rangle$ & $22.5^{o}$ & $0$      &$22.5^{o}$  & $90^{o}$\\\hline
\end{tabular}
\end{center}
\caption{The tomographic projectors used by James et
  al.\cite{James2001}.  $h_i$ and $q_i$ are the angles of the half- and
  quarter-waveplates in mode $i$.  Sixteen combinations of Pauli operators were
  chosen out of the 36 possible combinations, resulting in better
  estimation of $\sigma_z\otimes\sigma_z$ outcomes than, for instance,
  $\sigma_y\otimes \sigma_y$.}
\label{tab:James_measurements}
\end{table}
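The basis counts quoted above can be read off Table \ref{tab:James_measurements} mechanically. The following Python sketch is only an illustration: its lookup table and list names are hypothetical, and it simply maps each single-qubit state to the Pauli operator it diagonalizes and tallies how many outcomes of each two-qubit product basis appear among the sixteen projectors.

```python
from collections import Counter

# Hypothetical lookup: polarization state -> Pauli operator it is an eigenstate of
PAULI_OF = {"H": "z", "V": "z", "D": "x", "A": "x", "R": "y", "L": "y"}

# (mode 1, mode 2) states for the projectors nu = 1..16, in table order
JAMES_PROJECTORS = ["HH", "HV", "VV", "VH", "RH", "RV", "DV", "DH",
                    "DR", "DD", "RD", "HD", "VD", "VL", "HL", "RL"]


def basis_counts(projectors):
    """Count how many outcomes of each product basis (e.g. 'zz') the set contains."""
    return Counter(PAULI_OF[a] + PAULI_OF[b] for a, b in projectors)


counts = basis_counts(JAMES_PROJECTORS)
for basis, n in counts.most_common():
    print(f"sigma_{basis[0]} x sigma_{basis[1]}: {n} of 4 outcomes")
```

Only $\sigma_z\otimes\sigma_z$ appears with all four of its outcomes, which is the imbalance discussed above.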

There is no reason that the measurements should be biased in this
way.  From an informational point of view, the James measurement strategy is
inherently wasteful because it records only some elements of the PVMs
it includes, whereas one could collect all the outcomes of every PVM
for the same number of input photons.  From this perspective,
measuring all 36 tensor products of single-qubit projectors is
no more costly than measuring only sixteen, but offers more
information which should both eliminate the bias in favour of the
$\sigma_z\otimes \sigma_z$ basis and give a better overall estimate of
the state.

There need be no concern about having too many measurements as there
are well-established ways of adapting linear fitting and maximum
likelihood fitting to over-complete sets of measurements.  If we are
interested in understanding what limitations quantum mechanics
imposes on our ability to estimate the density matrix from a finite
number of copies, we need to consider sets of measurements
composed of complete PVMs that do not waste information in obvious ways.  

In recent years a few groups have started doing two-photon polarization quantum
state tomography using these 36 different
projectors\cite{Altepeter2005}, usually by
setting up four detectors, one at each output port of two polarizing
beamsplitters.  This has resulted in the elimination of the bias in
favour of a particular single-qubit basis and better overall
estimation of the quantum state.  

One can still ask, though, whether this is the \emph{optimal} way of
doing quantum state tomography, or whether there exists some other set
of measurements that might in some way be better.  To be precise, we
can pose the question, `Given $N$ copies of a quantum state from a
source, what set
of measurements will, on average, allow the best estimate of the
density matrix of the source?'.  This question is important not just
for technical reasons, but also appears to be related to the deeper
epistemological problem of how we obtain information about quantum
states, and hence to the ontological problem of what quantum states
\emph{are}\cite{Leonhardt1996}.  In fact, as will be seen in section \ref{section_wigner},
the optimal set of measurements for quantum state tomography is
directly tied to a compelling description of quantum states in terms
of the discrete Wigner function which has several conceptual
advantages over the density matrix as a description of quantum states.   

This chapter will present the first experiment to undertake optimal
quantum state tomography within the framework of PVM-based
measurements\footnote{While for practical reasons PVMs on individual
  copies of a quantum state are the most
  popular class of measurements implemented in
  experiments, they are not the only measurements that could
  potentially be used in state estimation.  Theorists have long been interested in other approaches to
optimal information extraction, and have shown that certain classes of
POVMs\cite{Renes2004} can do better than PVMs, and, perhaps more
surprisingly, that joint measurements performed over multiple copies
of a quantum system can extract more information than measurements on
the individual copies\cite{Massar1995}.  While theoretically
interesting, a real-world implementation of either approach would not
be possible with present technology, except in very specialized
circumstances\cite{Ling2006,Ling2008}.}.  As we will show, the key to selecting measurements is
to minimize the overlap between them, and with it
the amount of redundant information being collected.  This
requirement results in the use of \emph{mutually-unbiased bases}, a
special class of projective measurements particularly well-suited to quantum
state estimation.  


\section{Comparing different tomography strategies}
In order to gauge which quantum state estimation strategy is best,
we need a metric to determine how close an estimate of the quantum
state is to the true state of the system.  The `true state' of the
system is itself a philosophically troubling notion, since in
principle it would take an infinitely large dataset to determine what
the true state is.  For our
purposes we may consider it to simply be the asymptotic
estimated state after a sufficiently large number of measurements have
been made.  Typically in our experimental systems, systematic errors,
particularly errors in setting the waveplates accurately,
limit the convergence of the density matrix elements at the
$1\%$ level.  This gives us an experimentally convenient definition of
the true density matrix as the best estimate we can obtain before
systematic effects become dominant.  For numbers of copies of the
state small enough that systematic effects are overwhelmed by
uncertainty due to quantum randomness, we can probe the fundamental
limits of the estimation strategy by comparing a given estimate with
the asymptotic estimate.

For a given number of copies of the state we are interested not so
much in the estimated density matrix as in the variance of the
estimated density matrix over several repetitions of the characterization.
For each value of the number of copies $N_{\text{tot}}$ we can build a histogram of
the estimates and of their distances from the asymptotic estimate.

To make this comparison there are several different distance measures
available, most notably the fidelity\cite{Jozsa1994}, the
quantum Chernoff bound\cite{Audenaert2007}, and the Hilbert-Schmidt
and trace distances\cite{MikeandIke}.  Of these, a convincing
argument can be made that the Chernoff bound is the most
physically motivated measure, since it represents the limit on
one's ability to distinguish two quantum states from a finite
number of copies\cite{deBurgh2008}.  Suppose an experimentalist wishes to
distinguish whether a quantum state is $\rho$ or $\sigma$, but is only
given $N$ copies of the state to work with.  It can be shown
that her probability of incorrectly identifying the state goes
asymptotically as $P_e \approx e^{N \ln
  \lambda_{cb} \left( \rho,\sigma \right)}$, where $\lambda_{cb}$ is the
quantum Chernoff bound
\begin{equation}
\lambda_{cb}\left(\rho,\sigma\right)=\min_{0\leq s \leq 1}\text{Tr}\left\{\rho^s\sigma^{1-s}\right\}.
\end{equation}

When either $\sigma$ or $\rho$ is pure, the quantum Chernoff
bound is equal to the fidelity $F\left(\sigma,\rho\right)$, with the
fidelity defined as in section \ref{convergence}.  For all states, the
fidelity and its square root bound the Chernoff bound from below and above,
$F(\sigma,\rho)\leq\lambda_{cb}(\sigma,\rho)\leq\sqrt{F(\sigma,\rho)}$.  In a
previous study comparing different tomography
techniques\cite{deBurgh2008} a
comparison was made between using the quantum Chernoff bound and
using the fidelity and the two were found to give the same
qualitative results.  In the same work, the Hilbert-Schmidt
distance was found not to differentiate between different
choices of measurement bases that gave very different results in
terms of the Chernoff bound and the fidelity.  For this reason
the Hilbert-Schmidt distance was not considered a good choice
for comparing different tomography configurations, while the
Chernoff bound and fidelity were considered equivalent.  
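The relation between the two figures of merit can be explored numerically. The Python sketch below assumes NumPy, and its helper names and random test states are hypothetical.  It evaluates $\lambda_{cb}$ by a brute-force search over a grid of $s$ values and computes the fidelity from the singular values of $\sqrt{\rho}\sqrt{\sigma}$, using the convention $F=(\text{Tr}\sqrt{\sqrt{\sigma}\rho\sqrt{\sigma}})^2$ adopted in section \ref{convergence}.  A grid search only gives an upper estimate of the true minimum, which is adequate for illustration but not for precision work.

```python
import numpy as np


def mpow(a, p):
    """a**p for a Hermitian positive-semidefinite matrix (round-off negatives clipped)."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * w**p) @ v.conj().T


def fidelity(rho, sigma):
    """F = (Tr sqrt(sqrt(sigma) rho sqrt(sigma)))^2, via singular values of sqrt(rho) sqrt(sigma)."""
    sv = np.linalg.svd(mpow(rho, 0.5) @ mpow(sigma, 0.5), compute_uv=False)
    return float(sv.sum() ** 2)


def chernoff(rho, sigma, n_grid=2001):
    """Grid estimate of min_{0<=s<=1} Tr[rho^s sigma^(1-s)]; P_e ~ chernoff**N for N copies."""
    return min(float(np.real(np.trace(mpow(rho, s) @ mpow(sigma, 1.0 - s))))
               for s in np.linspace(0.0, 1.0, n_grid))


def random_state(dim, rng):
    """Random full-rank density matrix (hypothetical test input)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.real(np.trace(rho))


rng = np.random.default_rng(1)
rho, sigma = random_state(4, rng), random_state(4, rng)
F, lam = fidelity(rho, sigma), chernoff(rho, sigma)
print(F, lam, np.sqrt(F))                  # F <= lam <= sqrt(F)

pure = np.zeros((4, 4))
pure[0, 0] = 1.0                           # the pure state |HH><HH|
print(chernoff(pure, sigma), fidelity(pure, sigma))  # nearly equal for a pure state
```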

In the present work we use the fidelity rather than the Chernoff
bound since the fidelity can be written as an analytic function
of the density matrices rather than as a minimization.  This
makes it easy to calculate and allows us to derive
analytic results with it.  It is also more familiar to
experimentalists and is the figure of merit most often used in experiments\cite{MikeandIke}.

\section{Convergence of the estimate towards the true density matrix}
\label{convergence}
Consider an estimate of the density matrix $\rho_{\text{est}}$ taken from
measurements performed on the true state $\sigma$.  The fidelity
of the estimate with the true state is defined as\cite{MikeandIke} 
\begin{equation}
F=\left(\text{Tr}\sqrt{\sqrt{\sigma}\rho_{\text{est}}\sqrt{\sigma}}\right)^2.
\end{equation}
If the estimated density matrix is reasonably close to $\sigma$, then we can
write $\rho_{\text{est}}=\sigma+\delta\rho$ where $\delta\rho$
is a traceless error term whose elements will be small compared
to those of $\sigma$.  Small here means of order $1/\sqrt{N}$ if
$N$ individual measurement outcomes were averaged to create the
expectation values necessary for the state estimation.  We can
quantify the deviation from the true density matrix by the \emph{infidelity},
\begin{equation}
I=1-F.
\end{equation}

In the perturbative limit we can rewrite the fidelity in terms of
$\delta \rho$ as
\begin{equation}
F=\left(\text{Tr}\sqrt{\sqrt{\sigma}\left(\sigma+\delta \rho\right)\sqrt{\sigma}}\right)^2.
\end{equation}
Since $\sigma$ and $\sqrt{\sigma}$ commute,
\begin{equation}
F=\left(\text{Tr}\sqrt{\sigma^2+\sqrt{\sigma}\delta \rho \sqrt{\sigma}}\right)^2.
\end{equation}
In general this is a difficult expression to deal with, but it
becomes both simple and illuminating in two special cases.  If
the true state is the pure state $\ket{\psi}$ then
$\sigma=\ket{\psi}\bra{\psi}$.  It follows that $\sqrt{\sigma}=\sigma$ and 
\begin{align}
\sqrt{\sigma}\delta \rho \sqrt{\sigma}&=\sigma \delta \rho \sigma\\
&=\ket{\psi}\bra{\psi} \delta\rho \ket{\psi}\bra{\psi}\\
&=\sigma \text{Tr}\left[\sigma \delta \rho\right]
\end{align}
Since $\sigma=\sigma^2$ and $\text{Tr}\,\sigma=1$,
\begin{align}
F&=\left(\text{Tr}\sqrt{\sigma^2+\sigma^2 \text{Tr}\sigma
  \delta\rho}\right)^2\\
&=\left(\text{Tr}\sigma
  \sqrt{1+\text{Tr}\sigma\delta\rho}\right)^2\\
&=1+\text{Tr}\sigma\delta\rho.
\end{align}
When $\sigma$ is pure, $\text{Tr}\sigma\delta\rho=\bra{\psi}\rho_{\text{est}}\ket{\psi}-1$
is always non-positive for a physical estimate: to the extent that a
measurement in the basis of $\sigma$ is in error, the weight of
$\rho_{\text{est}}$ on $\ket{\psi}$ is reduced and the weight on
states orthogonal to $\ket{\psi}$ is correspondingly increased.
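For a pure true state the result $F=1+\text{Tr}\sigma\delta\rho$ is in fact exact rather than merely perturbative, which makes it easy to check numerically.  In the Python sketch below (NumPy assumed) the error model $\rho_{\text{est}}=(1-p)\sigma+p\tau$, with $\tau$ a random state, is only a hypothetical stand-in for an estimation error; the infidelity grows linearly with $p$.

```python
import numpy as np


def sqrtm_psd(a):
    """Square root of a Hermitian positive-semidefinite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(sigma, rho):
    """F = (Tr sqrt(sqrt(sigma) rho sqrt(sigma)))^2."""
    s = sqrtm_psd(sigma)
    return float(np.real(np.trace(sqrtm_psd(s @ rho @ s))) ** 2)


rng = np.random.default_rng(3)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
sigma = np.outer(psi, psi.conj())                       # pure true state

g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
tau = g @ g.conj().T / np.real(np.trace(g @ g.conj().T))  # hypothetical error state

for p in (0.01, 0.02, 0.04):
    delta_rho = p * (tau - sigma)                       # traceless error term
    F = fidelity(sigma, sigma + delta_rho)
    linear = 1 + np.real(np.trace(sigma @ delta_rho))   # 1 + Tr[sigma delta_rho]
    print(p, 1 - F, 1 - linear)                         # infidelity doubles when p doubles
```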

The important thing to note here is that the reduction in the
fidelity is linear in the density matrix elements.
Since the uncertainty in $\rho$ scales as $1/\sqrt{N}$, we expect the
infidelity to also scale as $1/\sqrt{N}$, and this is indeed what we
observe.  For the experimentally collected tomography data shown in
figure \ref{fig:pure_state_estimation}, the infidelity between the estimated and true density
matrices scales as ${N_{\text{tot}}}^{-0.42\pm0.06}$ (the slope of the log-log
plot), reasonably close to the expected $N^{-1/2}$.
\begin{figure}
\subfigure[Dependence of the infidelity on $N$ for the pure quantum
  state $\ket{HH}$.  The scaling goes roughly as $1/\sqrt{N}$.]{\includegraphics[width=\columnwidth]{Figures/rootN.eps}\label{fig:pure_state_estimation}}
\subfigure[Dependence of the infidelity on $N$ for the maximally-mixed state $\frac{1}{4}\mathbb{I}_4$.  The scaling goes as $1/N$.]{\includegraphics[width=\columnwidth]{Figures/mixedstateN.eps}\label{fig:mixed_state_estimation}}
\label{fig:dependence_on_N}
\end{figure}

The other simple and illustrative case occurs when $\sigma$ is
the maximally-mixed state, $\sigma=\frac{1}{D}\mathbb{I}$ where
$D$ is the dimensionality of the Hilbert space.  When this
is the case, we have
\begin{align}
F&=\left(\text{Tr}\sqrt{\sigma^2+\sqrt{\sigma}\delta\rho\sqrt{\sigma}}\right)^2\\
&=\left(\text{Tr}\sqrt{\frac{1}{D^2}\mathbb{I}+\frac{1}{\sqrt{D}}
  \mathbb{I}\delta\rho\frac{1}{\sqrt{D}}\mathbb{I}}\right)^2\\
&=\left(\frac{1}{D}\text{Tr}\sqrt{\mathbb{I}+D\delta\rho}\right)^2.
\end{align}
We can now Taylor expand the square root using the expansion
$\sqrt{1+x}=1+\frac{1}{2}x-\frac{1}{8}x^2+\ldots$:
\begin{align}
F&=\left(\frac{1}{D}\text{Tr}\left[\mathbb{I}+\frac{D}{2}\delta\rho-\frac{D^2}{8}\delta\rho^2\right]\right)^2\\
&=\left(1-\frac{D}{8}\text{Tr}\delta\rho^2\right)^2\\
&=1-\frac{D}{4}\text{Tr}\delta \rho^2
+\frac{D^2}{64}\left(\text{Tr}\delta\rho^2\right)^2\\
&\approx 1-\frac{D}{4}\text{Tr}\delta \rho^2,
\label{eq:mixed_state_fidelity}
\end{align}
where we have used the fact that $\delta \rho$ is traceless in
the second line.  Since $\text{Tr}\delta\rho^2=\text{Tr}\rho_{\text{est}}^2-1/D$,
we can recognize it as the degree of
polarization of $\rho_{\text{est}}$: when the true state is
completely unpolarized, any polarization of the estimate results in infidelity
with the true state.  Unlike in the pure-state case, the
infidelity now scales quadratically with $\delta \rho$.  This is
also observed in experimental measurements, as shown in figure
\ref{fig:mixed_state_estimation}, where the infidelity can be seen to be
proportional to ${N_{\text{tot}}}^{-1.03 \pm 0.06}$.
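The quadratic behaviour can likewise be checked numerically.  The Python sketch below (NumPy assumed; the random traceless perturbation $\delta\rho=\epsilon X$ is a hypothetical test input) compares the exact infidelity with the maximally-mixed state against the leading-order estimate $\frac{D}{4}\text{Tr}\,\delta\rho^2$, which follows from $\sqrt{1+x}=1+\frac{1}{2}x-\frac{1}{8}x^2+\ldots$.

```python
import numpy as np


def sqrtm_psd(a):
    """Square root of a Hermitian positive-semidefinite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def fidelity(sigma, rho):
    """F = (Tr sqrt(sqrt(sigma) rho sqrt(sigma)))^2."""
    s = sqrtm_psd(sigma)
    return float(np.real(np.trace(sqrtm_psd(s @ rho @ s))) ** 2)


def random_traceless_hermitian(dim, rng):
    """Random traceless Hermitian matrix with unit Frobenius norm (so |eigenvalues| <= 1)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    x = (g + g.conj().T) / 2
    x = x - np.trace(x) / dim * np.eye(dim)
    return x / np.linalg.norm(x)


D = 4
sigma = np.eye(D) / D                        # maximally-mixed true state
X = random_traceless_hermitian(D, np.random.default_rng(7))

results = {}
for eps in (1e-2, 1e-3):
    delta_rho = eps * X                      # sigma + delta_rho stays positive for eps < 1/D
    exact = 1 - fidelity(sigma, sigma + delta_rho)
    approx = D / 4 * np.real(np.trace(delta_rho @ delta_rho))
    results[eps] = (exact, approx)
    print(eps, exact, approx)                # shrinking eps tenfold shrinks the infidelity ~100x
```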

\section{The choice of measurement bases}\label{analytic_infidelity}
These general scaling rules are a universal feature of quantum
state tomography, but the details of how $\delta\rho$ depends on
the number of measurements that have been made rely crucially on how those measurements relate to one another.
In order to distinguish different states, projections must be made in
different `directions' in the Hilbert space.  In fact, it makes
intuitive sense that the most powerful measurement strategy will be
the one that makes the direction of different
measurements as different as possible.  This intuition is indeed
correct, but before it can be applied we need to understand what is
meant by `direction' and to develop a mathematical framework for
deciding what `as different as possible' means.

In the Hilbert space of projectors, the natural measure of overlap is the
Hilbert-Schmidt inner product,
\begin{align}
\text{Tr}\left[\hat{P}_1 \hat{P}_2\right].
\end{align}

\begin{figure}
\includegraphics[width=\columnwidth]{Figures/hs_overlap.eps}
\caption{Plot of the Hilbert-Schmidt overlap of the projectors
  in standard separable state tomography.}
\label{fig:SSQST_overlaps}
\end{figure}
For a set of projectors to be maximally distant from each other,
the RMS value of this overlap should be made as small as
possible.  Figure \ref{fig:SSQST_overlaps} plots this overlap for the 36 projectors
onto tensor products of eigenstates of the single-qubit Pauli
operators.  We shall call the tomography scheme that makes
measurements in these bases \emph{standard separable quantum state
tomography} (SSQST).  It is clear from the graph that these overlaps are
not all equal.  In particular, there are pairs of bases that
share a common single-qubit Pauli operator, for example
$\sigma_z\otimes\sigma_z$ and $\sigma_x\otimes\sigma_z$.  For
these pairs of bases, the overlaps are either $0$ or $0.5$,
depending on whether a given pair of projectors share the same
eigenstate of $\sigma_z$ for the second qubit.  Other pairs of
bases, such as $\sigma_z\otimes\sigma_z$ and
$\sigma_x\otimes\sigma_x$, share no single-qubit eigenstates, and
therefore all of their overlaps are $0.25$.
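These overlap values can be reproduced directly.  The Python sketch below assumes NumPy and the phase conventions $\ket{R}=(\ket{H}+i\ket{V})/\sqrt{2}$, $\ket{L}=(\ket{H}-i\ket{V})/\sqrt{2}$; its dictionary layout and helper names are hypothetical.  It builds the 36 SSQST projectors and lists the distinct values of $\text{Tr}[\hat{P}_1\hat{P}_2]$ between two chosen bases.

```python
import numpy as np
from itertools import product

S = 1 / np.sqrt(2)
# Single-qubit eigenstates of each Pauli operator (assumed phase conventions)
EIGENSTATES = {
    "z": [np.array([1, 0]), np.array([0, 1])],               # H, V
    "x": [np.array([S, S]), np.array([S, -S])],              # D, A
    "y": [np.array([S, 1j * S]), np.array([S, -1j * S])],    # R, L
}


def projector(ket):
    return np.outer(ket, ket.conj())


def ssqst_bases():
    """Map each product-basis label, e.g. 'xz' for sigma_x (x) sigma_z, to its four projectors."""
    return {a + b: [projector(np.kron(u, v)) for u in EIGENSTATES[a] for v in EIGENSTATES[b]]
            for a, b in product("xyz", repeat=2)}


def overlaps(basis1, basis2):
    """Distinct Hilbert-Schmidt overlaps Tr[P1 P2] between projectors of two bases."""
    return sorted({round(float(abs(np.trace(p @ q))), 6) for p in basis1 for q in basis2})


bases = ssqst_bases()
print(overlaps(bases["zz"], bases["xz"]))   # [0.0, 0.5]: sigma_z shared on qubit 2
print(overlaps(bases["zz"], bases["xx"]))   # [0.25]: no common single-qubit operator
```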

Unfortunately, no complete set of bases built from tensor products of
single-qubit eigenstates has equal overlaps between all projectors drawn
from different bases.  One can interpret the inequality of
the overlaps as being due to the failure of the measurement
scheme to reflect the symmetry of the underlying Hilbert space.
Consider, for example, a maximally-entangled state.  Such states can be thought of as
having all their information contained in
correlations and none in the single-photon polarizations.  In
order to characterize such a state, a scheme like SSQST must
make single-qubit measurements and examine the correlation data
to determine the state.  In the process of collecting these
correlations, the single-qubit polarizations must be measured
multiple times.  A more efficient scheme could measure each
single-qubit polarization once, and then use entangling measurements
to determine the correlations that cannot be determined from the
single-qubit measurements.  Once one allows the possibility of
using entangling measurements it becomes possible to create an
optimal measurement scheme where the overlap between any pair of
PVM elements drawn from different bases is equal.  Mathematically, if $m$ and $n$
label PVM elements while $\alpha$ and $\beta$ label PVMs, such
bases will have the property
\begin{equation}
\text{Tr}\left[ \hat{P}_{\alpha,m}\hat{P}_{\beta,n}
\right]=\delta_{\alpha\beta}\delta_{mn}+\left(1-\delta_{\alpha\beta}\right)\frac{1}{D}.
\label{definition_mutual_unbiasedness}
\end{equation}

These bases were first introduced in the context of quantum state estimation
by Wootters and Fields\cite{Wootters1989}, who argued that the
condition in equation \ref{definition_mutual_unbiasedness} is precisely the one
that needs to be satisfied for optimal state estimation.  They called
these sorts of PVMs \emph{mutually-unbiased bases}, or MUBs.
They were able to show that the maximum number of MUBs in a
Hilbert space of dimension $D$ is $D+1$, and that a complete set of $D+1$ MUBs
exists whenever $D$ is a power of a prime number.\footnote{When $D$ is not a power of a prime, it is generally believed
that a complete set of $D+1$ MUBs does not exist, although proving this remains an open
problem.  There is strong numerical evidence supporting this
belief for $D=6$\cite{Vianna2008}.}

Wootters
and Fields calculated the Shannon entropy reduction per
measurement in a general complete QST scheme and argued
that measurements in MUBs would maximize this
quantity.  Unfortunately, their geometric argument is hard to
apply to over-complete sets of bases such as the bases of SSQST.
Here we present an alternative argument based on the density-matrix
error analysis of Chapter 2.

We will consider
the advantage that MUBs have for estimating the maximally-mixed
state from experimental data.  Since this is the average of all
states on the Hilbert space it seems at least plausible that the
scheme that estimates the maximally-mixed state best will, on
average, estimate all states better.  Clearly if one were
interested in estimating a particular range of
states, then one could construct a tomographic scheme tailored
to those states that could do better. 

Recall from Chapter 2 that the variance in the density matrix can be expressed
as 
\begin{align}
\left(\Delta \rho_{ij}\right)^2=\sum_{kq,ab}\frac{\partial \rho_{ij}}{\partial
  P_{kq}}\frac{\partial \rho_{ij}}{\partial
  P_{ab}}\delta P_{kq}\delta P_{ab}.
\label{eq:error_calc}
\end{align}
We can simplify the problem by neglecting the error in the
normalization so that $P_{kq}=n_{kq}/N$, where $N$ is the number
of copies per basis.  For the maximally-mixed state,
$\left<P_{kq}\right>=1/D=1/4$ in any basis, and with the normalization
error neglected each count $n_{kq}$ has a variance equal to its mean
$N/4$, so that $\delta P_{kq}=\sqrt{\frac{1}{4N}}$.  Neglecting the error in the
normalization also makes all the $\delta P_{kq}$ statistically
independent, so that

\begin{equation}
\left<\delta P_{kq}\delta
P_{ab}\right>=\delta_{ka}\delta_{qb} \frac{1}{4N}.
\label{eq:statistical_independence}
\end{equation}
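The effect of neglecting the normalization can be seen in a short Monte Carlo sketch.  Modelling the counts as independent Poisson variables is our assumption for illustration; with a fixed number $N$ of detected pairs per basis the counts are instead multinomial, and the frequencies acquire a variance $\frac{3}{16N}$ and a small negative covariance $-\frac{1}{16N}$.  The Python code below (NumPy assumed; the function name is hypothetical) estimates both cases for the maximally-mixed two-qubit state.

```python
import numpy as np


def outcome_statistics(n_per_basis, fixed_total, trials=200_000, seed=0):
    """Variance of P_kq and covariance of two different outcomes for the maximally-mixed state."""
    rng = np.random.default_rng(seed)
    p = np.full(4, 0.25)                       # <P_kq> = 1/D = 1/4 in every basis
    if fixed_total:
        # exactly N detected pairs per basis: multinomial counts
        counts = rng.multinomial(n_per_basis, p, size=trials)
    else:
        # normalization error neglected: independent Poisson counts with mean N/4
        counts = rng.poisson(n_per_basis * p, size=(trials, 4))
    freqs = counts / n_per_basis               # P_kq = n_kq / N
    cov = np.cov(freqs, rowvar=False)
    return cov[0, 0], cov[0, 1]


N = 100
print("Poisson:    ", outcome_statistics(N, fixed_total=False), "expected", (1 / (4 * N), 0.0))
print("multinomial:", outcome_statistics(N, fixed_total=True), "expected", (3 / (16 * N), -1 / (16 * N)))
```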

Now $N$ refers to
the number of copies of the state used \emph{per basis}, but in
comparing tomography schemes we need to consider the total
number of copies available to be measured, $N_{\text{tot}}$.  We will
assume that the available copies are split equally between the
different bases so that $N=N_{\text{tot}}/B$, where $B$ is the number of
bases in a given scheme.  For two qubits, $B=D+1=5$ for MUBs tomography and $B=9$
for standard separable tomography.

Applying this fact and equation
\ref{eq:statistical_independence} to equation
\ref{eq:error_calc}, we obtain
\begin{align}
\left(\Delta
\rho_{ij}\right)^2=\frac{1}{4N}\sum_{kq}\left|\left(\left(\mathbf{M}^\dagger\mathbf{M}\right)^{-1}\mathbf{M}^\dagger\right)_{ij,kq}\right|^2=\frac{B}{4N_{\text{tot}}}\sum_{kq}\left|\left(\left(\mathbf{M}^\dagger\mathbf{M}\right)^{-1}\mathbf{M}^\dagger\right)_{ij,kq}\right|^2.
\end{align}
Since $\text{Tr}\delta\rho^2=\sum_{ij}\left|\delta\rho_{ij}\right|^2$, the expression
for the fidelity of the maximally-mixed state derived in equation
\ref{eq:mixed_state_fidelity}, with $D=4$, gives an expected infidelity of
\begin{align}
I=1-F&\approx\frac{D}{4}\sum_{i,j=1}^4\left(\Delta
    \rho_{ij}\right)^2\\
&=\frac{B}{4N_{\text{tot}}}\sum_{i,j=1}^4 \sum_{kq}\left|\left(\left(\mathbf{M}^\dagger\mathbf{M}\right)^{-1}\mathbf{M}^\dagger\right)_{ij,kq}\right|^2 .
\end{align}
This result allows us to calculate the ratio of the infidelities
achieved via the two tomography methods for the same value
of $N_{\text{tot}}$:
\begin{equation}
\frac{I_{\text{SSQST}}}{I_{\text{MUBs}}}=\frac{B_{\text{SSQST}}}{B_{\text{MUBs}}}
    \frac{\sum_{i,j=1}^4 \sum_{kq}\left|\left(\left(\mathbf{M}_{\text{SSQST}}^\dagger\mathbf{M}_{\text{SSQST}}\right)^{-1}\mathbf{M}_{\text{SSQST}}^\dagger\right)_{ij,kq}\right|^2}{\sum_{i,j=1}^4 \sum_{kq}\left|\left(\left(\mathbf{M}_{\text{MUBs}}^\dagger\mathbf{M}_{\text{MUBs}}\right)^{-1}\mathbf{M}_{\text{MUBs}}^\dagger\right)_{ij,kq}\right|^2}.
\label{eq:infidelity_ratio}
\end{equation}
We expect that the elements of
$\left(\mathbf{M}^\dagger\mathbf{M}\right)^{-1}\mathbf{M}^\dagger$
will be large whenever two rows of $\mathbf{M}$ are close to
being degenerate.  This fits with our intuition that a good
tomography scheme will distribute its projectors widely in the
Hilbert space.  For reasonably well-balanced tomography
schemes like SSQST and MUBs we expect that over-complete sets
will tend to have smaller values for the sum than merely
complete sets, because larger sets of measurements draw a
smaller contribution from each measurement to arrive at a
density matrix element.  The sum of the squares of a large
number of small contributions will generally be smaller than the
sum of the squares of a smaller number of large contributions.
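In practice these sums need not be formed element by element.  Since the pseudo-inverse $\mathbf{M}^{+}=(\mathbf{M}^\dagger\mathbf{M})^{-1}\mathbf{M}^\dagger$ satisfies $\mathbf{M}^{+}\mathbf{M}^{+\dagger}=(\mathbf{M}^\dagger\mathbf{M})^{-1}$, each sum $\sum_{kq}|\mathbf{M}^{+}_{ij,kq}|^2$ is simply a diagonal element of $(\mathbf{M}^\dagger\mathbf{M})^{-1}$.  The Python sketch below (NumPy assumed; helper names hypothetical) checks this identity for the six single-qubit Pauli eigenprojectors, with $\mathbf{M}$ acting on the row-major flattening of $\rho$ so that $(\mathbf{M}\rho)_{kq}=\text{Tr}[\hat{P}_{kq}\rho]$; the same functions accept any list of two-qubit projectors.

```python
import numpy as np

S = 1 / np.sqrt(2)
# H, V, D, A, R, L for one qubit (assumed phase conventions)
KETS = [np.array([1, 0]), np.array([0, 1]),
        np.array([S, S]), np.array([S, -S]),
        np.array([S, 1j * S]), np.array([S, -1j * S])]


def measurement_matrix(projectors):
    """Rows act on rho.flatten() so that (M @ rho.flatten())[r] = Tr[P_r rho]."""
    return np.array([p.conj().flatten() for p in projectors])


def error_sums(m):
    """sum_kq |M^+_{ij,kq}|^2 for every density-matrix element ij (row-major order)."""
    m_plus = np.linalg.pinv(m)             # equals (M^dag M)^-1 M^dag for full column rank
    return np.sum(np.abs(m_plus) ** 2, axis=1)


projectors = [np.outer(k, k.conj()) for k in KETS]
M = measurement_matrix(projectors)
sums = error_sums(M)
diag = np.real(np.diag(np.linalg.inv(M.conj().T @ M)))
print(sums, diag)                          # identical: [2/3, 1, 1, 2/3]
```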

By this argument we expect the sum to be somewhat smaller for
SSQST than for MUBs, but not enough to make up for the leading factor
$\frac{B_{\text{SSQST}}}{B_{\text{MUBs}}}=9/5$.  When we
calculate the ratio of the sums for the two tomography schemes, we
find that it has a value of $0.7691$, which is indeed greater than
$5/9$.  The total advantage
  expected for MUBs is therefore $I_{\text{SSQST}}/I_{\text{MUBs}}=0.7691\times 9/5\approx 1.38$, in reasonable agreement
  with measurements, as we shall see.

For states other than the maximally-mixed state, the advantage of
MUBs over SSQST will be state-dependent.  As a general rule,
though, we expect MUBs to display the most significant
advantage for entangled states and the least advantage for separable
states since SSQST is biased towards better estimation of
single-qubit polarizations at the expense of correlations.  The
magnitude of the advantage will be determined by a relation very
similar to equation \ref{eq:infidelity_ratio}, involving a sum of
squares of linear map elements, and so as long as the map is
non-singular (which it clearly is for both MUBs and SSQST), this
ratio will be of order unity.

\section{Constructing MUBs for two qubits}
When $D$ is a power of two,
it can be shown that the common eigenstates of suitably chosen sets of mutually commuting Pauli
operators (i.e.\ tensor products of the Pauli matrices) form
MUBs\cite{Lawrence2002}.  Since mutually-commuting sets of Pauli
operators are easy to construct, this offers a simple and elegant
construction algorithm for MUBs.

Consider the pairwise tensor products of Pauli operators,
including the identity, $\sigma_\mu\otimes\sigma_\nu$ where
$\sigma_\mu=\left(\sigma_x,\sigma_y,\sigma_z,\mathbb{I}\right)$
for $\mu=\left(1,2,3,4\right)$.  If we exclude the term
$\mathbb{I}\otimes\mathbb{I}$, we can divide the remaining
15 operators into $5$ sets of
$3$ mutually commuting operators, each set sharing a common set of eigenstates.  One possible
such division is shown in table \ref{tab:mubs_eigenops}.

\begin{table}[t]
\begin{center}
\begin{tabular}{|c|c|}
\hline
Pauli operators & Eigenstates/Mutually-unbiased bases\\
\hline\hline
$\sigma_z\otimes\mathbb{I}$, $\mathbb{I}\otimes\sigma_z$,
$\sigma_z\otimes\sigma_z$ & $\ket{HH}$, $\ket{HV}$, $\ket{VH}$,
$\ket{VV}$ \\
\hline
$\sigma_x\otimes\mathbb{I}$, $\mathbb{I}\otimes\sigma_y$,
$\sigma_x\otimes\sigma_y$ & $\ket{DR}$, $\ket{DL}$,$\ket{AR}$,$\ket{AL}$\\
\hline
{$\sigma_y\otimes\mathbb{I}$, $\mathbb{I}\otimes\sigma_x$,
$\sigma_y\otimes\sigma_x$} & $\ket{RD}$, $\ket{RA}$, $\ket{LD}$, $\ket{LA}$\\
\hline
\multirow{2}{*}{$\sigma_y\otimes\sigma_y$, $\sigma_z\otimes\sigma_x$,
$\sigma_x\otimes\sigma_z$} &
$\frac{1}{\sqrt{2}}\left(\ket{RL}+i\ket{LR}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RL}-i\ket{LR}\right)$,\\
& $\frac{1}{\sqrt{2}}\left(\ket{RR}+i\ket{LL}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RR}-i\ket{LL}\right)$ \\
\hline
\multirow{2}{*}{$\sigma_x\otimes\sigma_x$, $\sigma_y\otimes\sigma_z$,
$\sigma_z\otimes\sigma_y$} &
$\frac{1}{\sqrt{2}}\left(\ket{RV}+i\ket{LH}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RV}-i\ket{LH}\right)$,\\
& $\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RH}-i\ket{LV}\right)$ \\
\hline
\end{tabular}
\end{center}
\caption{The construction of mutually-unbiased bases according
  to the method of \cite{Lawrence2002}.  The states making up
  mutually-unbiased bases for two qubits are the eigenstates of
  sets of three mutually commuting Pauli operators.}
\label{tab:mubs_eigenops}
\end{table}
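Table \ref{tab:mubs_eigenops} can be checked against the definition in equation \ref{definition_mutual_unbiasedness}.  The Python sketch below assumes NumPy and the convention $\ket{R}=(\ket{H}+i\ket{V})/\sqrt{2}$, $\ket{L}=(\ket{H}-i\ket{V})/\sqrt{2}$, and its helper names are hypothetical.  It builds the twenty states of the table, confirms that each basis is orthonormal and that every cross-basis overlap $\left|\langle u|v\rangle\right|^2$ equals $1/D=1/4$, and uses the purity of the single-qubit reduced state to separate the separable bases from the maximally-entangled ones.

```python
import numpy as np
from itertools import combinations

S = 1 / np.sqrt(2)
H, V = np.array([1, 0]), np.array([0, 1])
D, A = S * (H + V), S * (H - V)            # D is the diagonal state here, not the dimension
R, L = S * (H + 1j * V), S * (H - 1j * V)  # assumed circular-polarization convention
k = np.kron

MUBS = [
    [k(H, H), k(H, V), k(V, H), k(V, V)],
    [k(D, R), k(D, L), k(A, R), k(A, L)],
    [k(R, D), k(R, A), k(L, D), k(L, A)],
    [S * (k(R, L) + 1j * k(L, R)), S * (k(R, L) - 1j * k(L, R)),
     S * (k(R, R) + 1j * k(L, L)), S * (k(R, R) - 1j * k(L, L))],
    [S * (k(R, V) + 1j * k(L, H)), S * (k(R, V) - 1j * k(L, H)),
     S * (k(R, H) + 1j * k(L, V)), S * (k(R, H) - 1j * k(L, V))],
]


def is_mub_set(bases, tol=1e-12):
    """Check orthonormality within each basis and |<u|v>|^2 = 1/dim across bases."""
    dim = len(bases[0][0])
    for basis in bases:
        gram = np.array([[np.vdot(u, v) for v in basis] for u in basis])
        if not np.allclose(gram, np.eye(dim), atol=tol):
            return False
    for b1, b2 in combinations(bases, 2):
        if any(abs(abs(np.vdot(u, v)) ** 2 - 1 / dim) > tol for u in b1 for v in b2):
            return False
    return True


def reduced_purity(ket):
    """Tr[rho_1^2] of qubit 1: 1 for product states, 1/2 for maximally-entangled states."""
    c = ket.reshape(2, 2)                  # c[i, j] is the amplitude of |i>|j>
    rho_1 = c @ c.conj().T                 # partial trace over qubit 2
    return float(np.real(np.trace(rho_1 @ rho_1)))


print(is_mub_set(MUBS))                                     # True
print([[round(reduced_purity(u), 6) for u in b] for b in MUBS])
```

Replacing either entangled basis by a product basis such as the eigenbasis of $\sigma_x\otimes\sigma_z$ makes the check fail, since that basis shares $\sigma_z$ eigenstates on the second qubit with the first basis.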

We note that three of the bases are separable and the other
two are maximally-entangled.  This represents the optimal
partitioning of the measurements on the Hilbert space: the three
separable bases probe the single-qubit polarizations, while
the two entangled bases directly probe the correlations missed by the
single-qubit measurements.  From an experimental
point of view this division of the measurements into separable
and maximally-entangled projections is fortuitous since it means
that MUBs tomography can be implemented relatively easily in a
linear optics system.

\section{Experiment}
To experimentally demonstrate the superiority of MUBs for quantum
state estimation tasks, many individual quantum state estimation experiments
must be repeated and statistically analyzed to see which strategy, on
average, gives the highest fidelity with the true or asymptotic
state. To that end, we developed an automated quantum state tomography
system capable of performing the required measurements for both MUBs
tomography and SSQST.  

The main requirement in such a system is that it be capable of
performing projections onto separable states and also onto maximally-entangled states.  For two qubits, all maximally-entangled states are
related to each other by separable unitary transformations, and all
separable states are similarly related to each other\footnote{This
  follows from the Schmidt decomposition of the two-qubit state.
  Using this decomposition it can easily be shown\cite{MikeandIke}
  that the only property of the state not entirely determined by
  single-qubit operations is the set of Schmidt coefficients or, equivalently, the
  degree of entanglement.}.  Thus if a system is capable of performing
projections onto any maximally-entangled state and onto any separable
state, and of performing arbitrary unitary transformations on the
individual qubits, then all the measurements required for
MUBs tomography and standard separable tomography can be achieved.

\subsection{Separable measurements}
All of the necessary separable measurements can be achieved by performing
ordinary polarization analysis on the two qubits separately and keeping track
of correlations between measurements\cite{James2001}.  As was seen in
Chapter 2, this requires only a quarter-waveplate, a half-waveplate and a polarizer in each beam
path, followed by coincidence detection to determine when the
single-photon detectors for the two qubits fire at the same time.
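The following is a minimal Jones-calculus sketch of one analyzer arm. It assumes that the light passes the quarter-waveplate first, then the half-waveplate, then a polarizer transmitting $H$. Fast-axis angles are measured from horizontal, and global phases are dropped. The function names are illustrative. The sketch returns the polarization state $\ket{\psi}$ onto which the arm projects, so that an input $\ket{\text{in}}$ is transmitted with probability $|\langle\psi|\text{in}\rangle|^2$.

```python
import numpy as np

H = np.array([1, 0], dtype=complex)
V = np.array([0, 1], dtype=complex)


def hwp(theta):
    """Half-waveplate with fast axis at theta (radians); global phase dropped."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[c, s], [s, -c]], dtype=complex)


def qwp(theta):
    """Quarter-waveplate with fast axis at theta: R(theta) diag(1, i) R(-theta)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag([1, 1j]) @ rot.T


def projected_state(q_angle, h_angle):
    """State |psi> selected by the sequence QWP -> HWP -> H polarizer."""
    u = hwp(h_angle) @ qwp(q_angle)  # the light meets the QWP first
    # Transmitted amplitude <H|u|in> = <psi|in> with |psi> = u^dagger |H>.
    return u.conj().T @ H
```

For example, `projected_state(0, np.pi / 8)` returns a circular polarization state. Which handedness it corresponds to depends on sign conventions, so the sketch does not label it $R$ or $L$.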

While it is possible to collect all four PVM elements of a separable
measurement simultaneously, the unequal collection efficiencies of
the detectors make proper normalization difficult.  In practice it was
found to be easier to use a single pair of detectors and 36
different waveplate settings to record all 36 projectors necessary for
standard separable tomography.  The twelve separable MUBs projectors
were collected using the same setup.  This approach tarnishes some of
the luster of MUBs tomography, since the information extracted from
each photon is not optimal, as it would be if each basis were measured
with a single setting that recorded all four outcomes at once.  From a
practical point of view, though, there is very little difference, since
measurement outcomes are normalized not to the number of
photon pairs produced by the source, but to the number detected.  This
normalization process sweeps collection inefficiencies under the rug
while preserving the ability to measure information-theoretic
inefficiencies in the amount of information extracted about the state
from each \emph{detected} photon pair.  This is all that is needed to
validate the information-theoretic advantages of MUBs.
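The normalization described above amounts to dividing each count by the total number of coincidences detected in its basis. The following is a minimal sketch. It assumes that the counts are arranged with one basis per row and the four outcomes in columns; this layout and the function name are illustrative.

```python
import numpy as np


def relative_frequencies(counts):
    """Normalize coincidence counts to the number of detected pairs per basis.

    counts: array-like of shape (n_bases, 4); row b holds the coincidence
    counts for the four projectors of basis b.  Each row of the result sums
    to 1, so an overall collection efficiency that scales a whole row
    cancels out.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)
```

A common efficiency factor that scales all four counts in a row leaves that row's frequencies unchanged. Drifts in source brightness between settings, however, would not cancel in this way.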

The measurement apparatus used to perform the separable tomography is
shown in figure \ref{fig:separable_apparatus}.  The source of photons is
the type-I source described in Chapter 2 and is capable
of producing states with a variable degree of entanglement that can be
controlled by the pump polarization.  Liquid crystal waveplates (LCWPs) can be
used to randomize the polarization and produce mixed polarization
states.  Standard polarization analyzers are followed by single-photon
counting modules that fire upon detection of a single
photon, while coincidence electronics distinguish SPDC pairs from
background light.
\begin{figure}
\subfigure[Apparatus for measuring separable state projections]{\label{fig:separable_apparatus}\includegraphics[width=0.47\columnwidth]{Figures/mubs_apparatus.eps}}
\subfigure[Apparatus for entangled state projections]{\label{fig:entangled_apparatus}\includegraphics[width=0.47\columnwidth]{Figures/sep_apparatus.eps}}
\end{figure}

The measurement setting was selected by rotating the half- and quarter-waveplates, which were mounted in Newport PR50 motorized rotation stages
controlled via GPIB.

Coincidences were recorded during 3000 0.2-second intervals for each
of the 36 measurements.  Each interval
contained roughly 30 coincidence events per basis.  Because changing
waveplate angles took several seconds, the waveplates were moved only
after all 3000 counting intervals had been recorded for a given
setting.  The entire cycle of 3000 intervals for each of the
36 settings took approximately 36 hours, even though the counting itself
accounted for only $36\times3000\times0.2~\mathrm{s}=6$~hours.  The duration was limited
by the slow communication speed of the USB interface between the
coincidence counting module and the computer.  This measurement scheme
was not optimal, because long-term drifts due to day-night temperature
variations and other environmental effects would affect the
different measurement bases unequally.  Although one would not expect such drift to
affect polarization or waveplate retardance, it could affect laser
power, mode-hopping, liquid-crystal phase shift and detector dark
count rates.  Were this
a problem, it would be observable in the detection rates at the
single-photon counting modules.  These rates, however, were stable to
within shot-noise fluctuations, indicating that such drift
was not a statistically significant problem.
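One simple way to phrase the stability check described above is as a dispersion test. For shot-noise-limited (Poissonian) counts in equal intervals, the variance equals the mean, so a variance-to-mean ratio well above 1 signals drift. The sketch below illustrates this idea under that assumption; it is not necessarily the test used in this analysis. The singles rates in the example are hypothetical.

```python
import numpy as np


def dispersion_index(counts):
    """Variance-to-mean ratio of counts recorded in equal time intervals.

    For a stable, shot-noise-limited (Poissonian) rate the ratio is close
    to 1; a slow drift of the underlying rate adds variance and pushes the
    ratio above 1.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical singles counts: 3000 intervals at a constant mean rate,
    # and 3000 intervals whose mean drifts linearly from 900 to 1100.
    stable = rng.poisson(1000, size=3000)
    drifting = rng.poisson(np.linspace(900, 1100, 3000))
    print(dispersion_index(stable), dispersion_index(drifting))
```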

\subsection{MUBs measurements}
The maximally-entangled measurements can be obtained using two-photon
interference, as described in Chapter 1.  Recall that when a
two-photon polarization state is incident on a 50-50 beamsplitter, the
outcome is determined by interference.  If the polarization state is permutation
symmetric, the probability amplitudes for both photons leaving the
beamsplitter from the same port interfere constructively, and the photons
never leave from opposite ports.
If the polarization state is permutation anti-symmetric, the amplitudes
for the two photons leaving from opposite ports interfere constructively,
and the photons always do so.

If this interference effect is followed by post-selection on one
photon being horizontally polarized and one being vertically
polarized, then events in which the photons leave in opposite ports
can be caused only by a component of the state along the
anti-symmetric state with one horizontal photon and one vertical
photon, the singlet state
$\frac{1}{\sqrt{2}}\left(\ket{HV}-\ket{VH}\right)$.  Similarly, events
in
which the photons leave from the same port can be uniquely identified
with the symmetric component with one vertical and one horizontal
photon, the triplet state
$\frac{1}{\sqrt{2}}\left(\ket{HV}+\ket{VH}\right)$.  Thus detection of
such events constitutes a projection onto these states, and the measured
frequency of such outcomes as a fraction of all outcomes gives
an estimate of the expectation value of the projectors onto these
states.
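The argument above can be checked with a short calculation. The sketch below makes three assumptions: the beamsplitter is lossless, with intensity transmission $T$ and reflection $R=1-T$; it follows a particular real phase convention, in which input $a$ goes to $\sqrt{T}$ (port 1) $+\sqrt{R}$ (port 2) and input $b$ goes to $\sqrt{R}$ (port 1) $-\sqrt{T}$ (port 2); and the two photons overlap perfectly. The function name is hypothetical. For an input $c_{HV}\ket{HV}+c_{VH}\ket{VH}$, where the first label refers to the photon in input $a$, the function returns two probabilities: finding $H$ in port 1 and $V$ in port 2, and finding both photons in port 1.

```python
def hv_detection_probs(amp_hv, amp_vh, T):
    """Post-selected H/V detection probabilities after a lossless beamsplitter.

    amp_hv, amp_vh: amplitudes of |HV> and |VH>, where the first label is the
    photon entering input a and the second the photon entering input b.
    Assumed mode transformation (R = 1 - T):
        a -> sqrt(T) * port1 + sqrt(R) * port2
        b -> sqrt(R) * port1 - sqrt(T) * port2
    Returns (p_opposite, p_same):
        p_opposite: H detected in port 1 and V in port 2,
        p_same:     H and V both detected in port 1.
    """
    R = 1.0 - T
    # H in port 1, V in port 2: both photons transmitted (weight T, from |HV>)
    # or both reflected (weight R, from |VH>), with a relative minus sign.
    p_opposite = abs(-T * amp_hv + R * amp_vh) ** 2
    # Both photons in port 1: one transmission and one reflection for either
    # ordering, so |HV> and |VH> enter with the same weight sqrt(R*T).
    p_same = R * T * abs(amp_hv + amp_vh) ** 2
    return p_opposite, p_same
```

For $T=0.57$, an input $\ket{HV}$ gives opposite-port events with probability $T^2\approx0.32$, and $\ket{VH}$ gives them with probability $R^2\approx0.18$. Both inputs, however, give the same same-port probability $RT\approx0.25$. This is the asymmetry described below for the singlet projection.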

In a real experiment two major considerations must be taken into
account.  First, the visibility of the two-photon interference is
in practice significantly less than 100\%.  In our experiment it was
typically 93\%, limited by a number of factors including the unequal
transmission and reflection coefficients of the beamsplitter and
imperfect spatial overlap at the beamsplitter.  Second, for the
singlet state projection, if the reflection and transmission
coefficients are unequal, and coincidences are measured by projecting
one output port onto $V$ and the other onto $H$, then the detection
probability depends not only on the
$\frac{1}{\sqrt{2}}\left(\ket{HV}-\ket{VH}\right)$ projection, but on
the $\ket{HV}$ and $\ket{VH}$ projections individually, since one of
these will correspond to two transmission events and the other to two
reflection events.  The triplet state projection does not suffer from
this problem, since triplet state projections require the $H$ and $V$
photons to leave the \emph{same} port of the beamsplitter and hence
always involve one reflection event and one transmission event.  As it
happened, the beamsplitter used in the experiment had a splitting
ratio of T57-R43, a significant departure from 50-50.  Surprisingly, we
could not find a vendor who would guarantee a non-polarizing
beamsplitter to have a 50-50 splitting ratio to better than $\pm5\%$.
This being the case, it was found more convenient to use the triplet
projection and ignore the singlet state projection.  The limited
visibility was accounted for by reducing the off-diagonal coherence terms of
the projection operator, which in the basis
$\left\{\ket{HH},\ket{HV},\ket{VH},\ket{VV}\right\}$ was written as
\begin{equation}
\left( \begin{array}{cccc}
0 & 0 & 0 & 0\\
0 & RT & vRT & 0 \\
0 & vRT & RT & 0 \\
0 & 0 & 0 & 0
\end{array} \right),
\label{eq:projection_op}
\end{equation}
where $v$ is the two-photon interference visibility, and $R$ and $T$
are the reflection and transmission coefficients.  Numerical
simulations showed that the $93\%$ visibility affected the
results by less than the statistical error in the data.  As with
the separable measurements, 3000 0.2-second measurement intervals were
recorded for each set of waveplate angles needed to rotate the
operator in Eq.~(\ref{eq:projection_op}) into each of the entangled projectors listed in Table
\ref{tab:mubs_eigenops}.

Since $R+T\le 1$, the product $RT$ is at most $1/4$, so operator
(\ref{eq:projection_op}) has trace $2RT\le 0.5$.  The projection therefore
fails to produce an outcome at least $50\%$ of the time, even when the input is
$\ket{\psi^+}$, for which the success probability is $RT(1+v)$.  As the splitting ratio
becomes more unequal, the efficiency of the projection becomes even worse.  From
an experimental viewpoint this is not a problem, as it only
necessitates counting for a longer interval to obtain the same number
of coincidences.  From a conceptual point of view, though, a tomography
scheme that fails at least half of the time is problematic.
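This bound can be checked numerically using the T57-R43 splitting ratio and the 93\% visibility quoted above. In the following sketch, the operator of Eq.~(\ref{eq:projection_op}) is built in the basis $\ket{HH},\ket{HV},\ket{VH},\ket{VV}$. The function name is illustrative.

```python
import numpy as np


def triplet_projection_operator(R, T, v):
    """Visibility-reduced triplet projection operator, basis HH, HV, VH, VV."""
    P = np.zeros((4, 4))
    P[1, 1] = P[2, 2] = R * T        # populations of |HV> and |VH>
    P[1, 2] = P[2, 1] = v * R * T    # coherence reduced by the visibility v
    return P


if __name__ == "__main__":
    P = triplet_projection_operator(R=0.43, T=0.57, v=0.93)
    psi_plus = np.array([0, 1, 1, 0]) / np.sqrt(2)
    print(np.trace(P))               # trace = 2RT, about 0.49
    print(psi_plus @ P @ psi_plus)   # RT(1 + v), about 0.473
```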

Recently, some clever schemes using hyper-entangled
states and multiple photon degrees of freedom have
allowed a complete Bell-state measurement to be implemented
in linear optics\cite{Schuck2006}.
A MUBs tomography scheme that made use of this Bell-state
measurement technique would not suffer the same failure rate as
the scheme presented here.  It has also been suggested\cite{Klimov2008} that
MUBs tomography would be of great use in trapped ion systems,
where strong inter-particle interactions make deterministic Bell-state
measurements relatively easy operations to perform.  The present
experiment may be considered a proof-of-principle demonstration
of the sort of advantage that one might obtain in systems
capable of making efficient Bell-state measurements.

\subsection{Analysis}
The 3000 measurement intervals recorded for each setting were randomly combined
to form datasets with different total numbers of counts.
The maximum-likelihood fitting method was then applied to each dataset
to determine the density matrix most likely to have produced it.
The fidelity of each such density matrix was computed with respect to the density matrix
most likely to have produced the entire dataset.  This was repeated 30
times for each total number of counts, and the fidelities were averaged over
these 30 repetitions.  The data were then plotted on a log-log plot of
infidelity versus number of copies of the state.

The errors in the regime of interest are dominated by statistical
errors.  These were calculated by taking the standard deviation of the
infidelity over the set of 30 repetitions for each total number of
copies of the state.
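The resampling procedure can be sketched as follows. This is a minimal illustration, not the analysis code used here, and it rests on several assumptions. The maximum-likelihood reconstruction is represented by a placeholder callable \texttt{estimate}, which is hypothetical. Intervals are drawn independently for each measurement setting. The fidelity is taken to be the squared Uhlmann form $F(\rho,\sigma)=\left[\text{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right]^2$.

```python
import numpy as np


def psd_sqrt(m):
    """Square root of a Hermitian positive-semidefinite matrix via eigh."""
    w, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(w, 0.0, None))) @ vecs.conj().T


def fidelity(rho, sigma):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    w = np.linalg.eigvalsh(s @ sigma @ s)
    return float(np.sum(np.sqrt(np.clip(w, 0.0, None))) ** 2)


def resampled_infidelity(intervals, n_intervals, estimate, rho_ref,
                         repeats=30, seed=0):
    """Mean and standard deviation of 1 - F over random sub-datasets.

    intervals: array (n_settings, n_recorded) of per-interval coincidence
        counts for every measurement setting.
    n_intervals: number of intervals summed per setting in each sub-dataset.
    estimate: placeholder for a reconstruction routine (e.g. a
        maximum-likelihood fit) mapping summed counts to a density matrix.
    rho_ref: reference state, e.g. the fit to the entire dataset.
    """
    intervals = np.asarray(intervals)
    rng = np.random.default_rng(seed)
    infid = np.empty(repeats)
    for k in range(repeats):
        # Draw distinct intervals independently for each setting and sum them.
        counts = np.array([row[rng.choice(row.size, n_intervals, replace=False)].sum()
                           for row in intervals])
        infid[k] = 1.0 - fidelity(estimate(counts), rho_ref)
    return infid.mean(), infid.std(ddof=1)
```

The returned standard deviation corresponds to the statistical error bar described above. The mean is the quantity plotted against the number of copies.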

\subsection{Results}
Figures \ref{fig:HV} through \ref{fig:entangled_state} show the results of this analysis.  The
experiment was repeated for three different quantum states, namely the
separable state $\ket{HV}$, the maximally-mixed state
and the state
\begin{equation}
\rho=\left( \begin{array}{cccc}
0.5 & 0 & 0  & 0.43\\
0 & 0 & 0  & 0\\
0 & 0 & 0  & 0\\
0.43 & 0  & 0 & 0.5
\end{array}\right),
\end{equation}
which was the closest approximation we could make to a maximally-entangled state given the limitations of our system\footnote{See
  Chapter 2 for an explanation of these limitations.}.

\begin{figure}
\includegraphics[width=\columnwidth]{Figures/HV.eps}
\caption{Measured infidelity for
  $\ket{HV}$}
\label{fig:HV}
\end{figure}

\begin{figure}
\includegraphics[width=\columnwidth]{Figures/maximally_mixed.eps}
\caption{Measured infidelity for the maximally-mixed state}
\label{fig:maximally_mixed}
\end{figure}

\begin{figure}
\includegraphics[width=\columnwidth]{Figures/HH+VV.eps}
\caption{Measured infidelity for the partially mixed entangled
  state close to
  $\frac{1}{\sqrt{2}}\left(\ket{HH}+\ket{VV}\right)$}
\label{fig:entangled_state}
\end{figure}

MUBs tomography produced a lower
infidelity than separable tomography for the maximally-mixed state and
the entangled state, while a similar level of
infidelity was observed for the separable state $\ket{HV}$.  The
constant shift on
the log-log plot corresponds to a fixed ratio of infidelity between
the two estimation methods.  For the entangled state this ratio was
$1.84\pm0.06$.  For $\ket{HV}$ it was $1.09\pm0.04$.  For
the maximally-mixed state it was $1.49 \pm 0.05$, in reasonable agreement
with the analytical results of section \ref{analytic_infidelity}.
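A constant vertical offset between two lines on a log-log plot corresponds to a fixed ratio of infidelities. For example, a ratio of $1.84$ is an offset of $\log_{10}1.84\approx0.26$ decades. The sketch below extracts such a ratio with a shared-slope least-squares fit. The fitting procedure behind the quoted ratios is not specified here, so this is one reasonable choice rather than the analysis actually used. The example data in the check are synthetic.

```python
import numpy as np


def infidelity_ratio(n_copies, infid_a, infid_b):
    """Ratio infid_a / infid_b from a shared-slope fit on a log-log plot.

    Model: log10(infid) = slope * log10(N) + offset_method.  A constant
    vertical shift between the two lines is offset_a - offset_b, so the
    fixed infidelity ratio is 10 ** (offset_a - offset_b).
    Returns (ratio, slope).
    """
    x = np.log10(np.asarray(n_copies, dtype=float))
    ya = np.log10(np.asarray(infid_a, dtype=float))
    yb = np.log10(np.asarray(infid_b, dtype=float))
    n = x.size
    # Columns: shared slope, offset for method a, offset for method b.
    design = np.zeros((2 * n, 3))
    design[:, 0] = np.concatenate([x, x])
    design[:n, 1] = 1.0
    design[n:, 2] = 1.0
    y = np.concatenate([ya, yb])
    (slope, off_a, off_b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return 10 ** (off_a - off_b), slope
```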

These results are consistent with Monte Carlo simulations and with the
arguments of the preceding sections.  Because standard separable
state tomography produces better estimates of single-qubit
polarizations than correlations, we expect that MUBs tomography will produce a
larger improvement for entangled states than for separable states.  Indeed, this is what we
observe as can be seen by comparing the widely separated lines in figure \ref{fig:entangled_state} to
the essentially overlapping lines of figure
\ref{fig:HV}.  While MUBs tomography
shows a
significant advantage for entangled states, it is perhaps more
notable that it is no worse than SSQST even for the states
for which SSQST performs best.  This is evidence that using MUBs
can match or improve upon SSQST over the full range of states.

\section{The discrete Wigner function for two qubits}\label{section_wigner}
Another appealing feature of MUBs is their close relationship to the
discrete Wigner function\cite{Leonhardt1996,Gibbons2004}.  While
we will not delve too deeply into this rich subject here, we
will briefly show how a discrete Wigner function can be obtained from
the data collected in MUBs tomography.  The
analysis follows the framework laid out in reference \cite{Gibbons2004}. 

\subsection{Finite fields}
The theory of MUBs and discrete Wigner functions is closely related to
the theory of finite fields in number theory.  We can define a
four-element field $\mathbb{F}_4$ consisting of the symbols
$\left\{0,1,\omega,\bar{\omega}\right\}$ along with addition and
multiplication operations defined by tables
  \ref{tab:addition_table} and \ref{tab:multiplication_table}.
\begin{table}[ht]
\begin{minipage}[b]{0.45\linewidth}\centering
\begin{tabular}{c||c|c|c|c}
+ & 0 & 1 & $\omega$ & $\bar{\omega}$ \\
\hline\hline
0 & 0 & 1 & $\omega$ & $\bar{\omega}$ \\
\hline
1 & 1 & 0 & $\bar{\omega}$ & $\omega$ \\
\hline
$\omega$ & $\omega$ & $\bar{\omega}$ & 0 & 1 \\
\hline
$\bar{\omega}$ & $\bar{\omega}$ & $\omega$ & 1 & 0 
\end{tabular}
\caption{Addition table for $\mathbb{F}_4$.  Reproduced from reference \cite{Gibbons2004}.}
\label{tab:addition_table}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}\centering
\begin{tabular}{c||c|c|c|c}
$\times$ & 0 & 1 & $\omega$ & $\bar{\omega}$ \\
\hline\hline
0 & 0 & 0 & 0 & 0 \\
\hline
1 & 0 & 1 & $\omega$ & $\bar{\omega}$ \\
\hline
$\omega$ & 0 & $\omega$ & $\bar{\omega}$ & 1 \\
\hline
$\bar{\omega}$ & 0 & $\bar{\omega}$ & 1 & $\omega$ 
\end{tabular}
\caption{Multiplication table for $\mathbb{F}_4$.  Reproduced from reference \cite{Gibbons2004}.}
\label{tab:multiplication_table}
\end{minipage}
\end{table}
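The two tables can be generated mechanically by representing $\mathbb{F}_4$ as polynomials over $\mathbb{F}_2$ modulo the irreducible polynomial $x^2+x+1$, with $\omega\leftrightarrow x$ and $\bar{\omega}\leftrightarrow x+1$. This is a standard construction. The integer encoding and the function names below are illustrative choices.

```python
# Elements of F_4 encoded as 2-bit integers: 0 -> 0, 1 -> 1,
# omega -> 2 (the polynomial x), omega-bar -> 3 (x + 1).
NAMES = ["0", "1", "w", "wb"]


def gf4_add(a, b):
    """Addition in characteristic 2 is bitwise XOR of polynomial coefficients."""
    return a ^ b


def gf4_mul(a, b):
    """Multiply polynomials over GF(2) and reduce modulo x^2 + x + 1."""
    prod = 0
    for bit in range(2):
        if (b >> bit) & 1:
            prod ^= a << bit
    if prod & 0b100:          # degree-2 term: replace x^2 by x + 1
        prod ^= 0b111
    return prod


if __name__ == "__main__":
    for op, sym in ((gf4_add, "+"), (gf4_mul, "x")):
        print(sym, *NAMES)
        for a in range(4):
            print(NAMES[a], *(NAMES[op(a, b)] for b in range(4)))
```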
With the operations of multiplication and addition
defined, we can define lines on the discrete phase space
consisting of a grid of points labeled by a pair $(x,y)$ of
elements of $\mathbb{F}_4$.  Lines are the sets of points satisfying an
equation of the form $\alpha x+\beta y=\gamma$, where $\alpha$, $\beta$ and $\gamma$ are elements of
$\mathbb{F}_4$ and $\alpha$ and $\beta$ are not both zero; that is, lines of the form
$y=mx+b$ with $m,b\in\mathbb{F}_4$, together with the vertical lines $x=c$.  Of particular importance will be families of
parallel lines in the phase space, that is to say lines with the
same slope.  Since the phase space has $16$ discrete points and each line
contains $4$ of them, each such family contains $4$ parallel lines.  Each family
has a particular line, called a ray, that
passes through the origin $(0,0)$.  Each of the $4^2-1$ points not at the origin
defines a ray, but each ray contains three points other than the
origin, so the number of distinct rays is
$\left(4^2-1\right)/3=5$.  Consequently, the number of families
of parallel lines is also $5$.  It is left as an exercise for
the reader to check that the sets of blue-circled
points in figure \ref{fig:phase_space_lines} form lines in this
phase space and that each horizontal box contains a family of
parallel lines.
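The exercise can also be done by computer. The sketch below enumerates the five families, encoding $0,1,\omega,\bar{\omega}$ as $0,1,2,3$ so that addition is a bitwise XOR. It then checks that lines from different families meet in exactly one point. The ordering of lines within each family is an illustrative choice and need not match figure \ref{fig:phase_space_lines}.

```python
from itertools import combinations


def gf4_mul(a, b):
    """F_4 product with elements encoded 0, 1, 2 = omega, 3 = omega-bar."""
    prod = 0
    for bit in range(2):
        if (b >> bit) & 1:
            prod ^= a << bit
    return prod ^ 0b111 if prod & 0b100 else prod   # reduce x^2 -> x + 1


def line_families():
    """Five families of four parallel lines in the F_4 x F_4 phase space.

    Family "vertical" holds the lines x = c; family m (m in F_4) holds the
    lines y = m x + b.  Each line is a frozenset of (x, y) points.
    """
    families = {"vertical": [frozenset((c, y) for y in range(4)) for c in range(4)]}
    for m in range(4):
        families[m] = [frozenset((x, gf4_mul(m, x) ^ b) for x in range(4))
                       for b in range(4)]
    return families


if __name__ == "__main__":
    fams = line_families()
    # Lines from different families meet in exactly one point.
    for f, g in combinations(fams, 2):
        assert all(len(l1 & l2) == 1 for l1 in fams[f] for l2 in fams[g])
    print(len(fams), "families of", len(fams[0]), "parallel lines")
```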
\begin{figure}
\includegraphics[width=\columnwidth]{Figures/4strimaca.eps}
\caption{The five sets of four parallel lines in the $\mathbb{F}_4
  \times \mathbb{F}_4$ phase space.  Reproduced from reference \cite{Gibbons2004}.}
\label{fig:phase_space_lines}
\end{figure}
\subsection{Discrete Wigner functions}
For each family of parallel lines in
figure \ref{fig:phase_space_lines} we will
associate a MUB from table \ref{tab:mubs_eigenops}, and to each
particular line $\lambda$ we assign a projector $Q(\lambda)$, in the same order as
the lines and projectors appear in the figure
\ref{fig:phase_space_lines} and table \ref{tab:mubs_eigenops}.  Next, for each
point $(x,y)$ in the phase space we associate a \emph{point operator}
defined as the sum of all the projectors associated with lines that
contain that point, minus the identity:
\begin{equation}
A_{(x,y)}=\left[\sum_{\lambda \ni (x,y)}Q(\lambda)\right]-\mathbb{I}_4.
\end{equation}
Then we can define the discrete Wigner function of the state to
be 
\begin{equation}
W(x,y)=\frac{1}{N}\text{Tr}\left(\rho A_{(x,y)}\right),
\end{equation}
where $N=4$ is the dimension of the two-qubit Hilbert space,
so that
\begin{equation}
\rho=\sum_{(x,y)} W(x,y) A_{(x,y)}.
\label{eq:completeness}
\end{equation}
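Since each projector enters only through $\text{Tr}\left(\rho Q(\lambda)\right)$, the definition can be expressed entirely in terms of measured probabilities. Write $p(\lambda)=\text{Tr}\left(\rho Q(\lambda)\right)$ for the probability of the outcome assigned to line $\lambda$. Using $\text{Tr}\,\rho=1$ and $N=4$,
\begin{equation}
W(x,y)=\frac{1}{4}\left[\sum_{\lambda\ni(x,y)}\text{Tr}\left(\rho Q(\lambda)\right)-\text{Tr}\left(\rho\,\mathbb{I}_4\right)\right]
=\frac{1}{4}\left[\sum_{\lambda\ni(x,y)}p(\lambda)-1\right],
\end{equation}
where the sum runs over the five lines, one from each family, that pass through $(x,y)$.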

Defined in this way, the Wigner function has many useful
properties.  First, it is complete as equation
\ref{eq:completeness} attests.  Any density matrix $\rho$ has a
corresponding Wigner function.  Second, if the Wigner function
is summed along a series of parallel lines in the phase space,
one obtains a marginal probability distribution, namely the set
of expectation values for measurements done in the corresponding
MUB.  In this way the discrete Wigner function resembles the
ordinary continuous
Wigner function\cite{Wigner1932} on $(x,p)$ phase space,
where integrating
the function along any direction generates a non-negative
marginal probability distribution.  Like the continuous Wigner
function, the discrete Wigner function is real, but may be
negative at some points.  Despite this, summing the discrete Wigner
function along any line in the discrete phase space
yields a non-negative probability.
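The marginal property can be verified directly from the definition of the point operators. Fix a line $\lambda_0$. Summing $A_{(x,y)}$ over its four points counts $Q(\lambda_0)$ four times. Because two non-parallel lines meet in exactly one point, the same sum counts every line of each of the other four families exactly once. Since the four projectors of each MUB sum to the identity,
\begin{equation}
\sum_{(x,y)\in\lambda_0}A_{(x,y)}=4\,Q(\lambda_0)+4\,\mathbb{I}_4-4\,\mathbb{I}_4=4\,Q(\lambda_0),
\end{equation}
and therefore $\sum_{(x,y)\in\lambda_0}W(x,y)=\frac{4}{N}\text{Tr}\left(\rho\,Q(\lambda_0)\right)=\text{Tr}\left(\rho\,Q(\lambda_0)\right)$ for $N=4$.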

\subsection{Reconstruction}
We will reconstruct the discrete Wigner function for an
entangled state produced in our type-I setup. 
From experimental data taken during MUBs tomography we can
construct the sets of frequencies shown in table
\ref{tab:expectation_value_table} for each of the 5 MUBs.
\begin{table}
\begin{tabular}{|c|cccc|}
\hline
Projectors & \multicolumn{4}{c|}{Relative frequency}\\
\hline\hline
$\ket{HH}$,$\ket{HV}$,$\ket{VH}$,$\ket{VV}$ & 0.5254 & 0.0114 & 0.0058 & 0.4575\\
\hline
$\ket{DR}$,$\ket{DL}$,$\ket{AR}$,$\ket{AL}$ & 0.1980 & 0.3087 & 0.2745 & 0.2188\\
\hline
$\ket{RD}$,$\ket{RA}$,$\ket{LD}$,$\ket{LA}$ & 0.3101 & 0.2237 & 0.2445 & 0.2217\\
\hline
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$ &
\multirow{2}{*}{0.1004} & \multirow{2}{*}{0.0975} & \multirow{2}{*}{0.3688} &
\multirow{2}{*}{0.4333}\\
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$,$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$ & & & &\\ 
\hline
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$ &
\multirow{2}{*}{0.1068} & \multirow{2}{*}{0.0762} & \multirow{2}{*}{0.4510} &
\multirow{2}{*}{0.3660}\\
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$,
$\frac{1}{\sqrt{2}}\left(\ket{RH}+i\ket{LV}\right)$ & & & &\\
\hline
\end{tabular}
\caption{Measured relative frequencies for the projectors of each of the five MUBs, from MUBs tomography on
  an entangled state}
\label{tab:expectation_value_table}
\end{table}
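Each value of the Wigner function is one quarter of the sum of the
probabilities for the projectors associated with the lines going
through the point, minus 1, i.e.\
$W(x,y)=\frac{1}{4}\left[\sum_{\lambda\ni(x,y)}p(\lambda)-1\right]$,
where $p(\lambda)$ is the measured frequency of the outcome associated
with line $\lambda$.  As a result we can directly
calculate the Wigner function from the data in table
\ref{tab:expectation_value_table}.  The resulting Wigner
function is: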
\begin{equation}
W(x,y)=
\begin{array}{c|cccc|}
\hline
\bar{\omega} & 0.1913 & -0.0803 & -0.0001 & 0.1079\\
\omega & 0.2160 & -0.0790 & 0.0135 & 0.1239\\
1 & 0.0579 & 0.1286 & 0.0030 & 0.1193\\
0 & 0.0602 & 0.0420 & -0.0106 & 0.1064\\
\hline
& 0 & 1 & \omega & \bar{\omega}
\end{array}.
\end{equation} 
It will be immediately apparent to the reader that summing down
the columns and along the rows reproduces, to within rounding, the relative frequencies in the
first two rows of table \ref{tab:expectation_value_table}.  The
other families of parallel lines are not so evident, but the reader
can verify that summing
along them reproduces the frequencies in rows three through five.
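Because each point operator is a sum of projectors, the Wigner function follows from the relative frequencies alone, through $W(x,y)=\frac{1}{4}\left[\sum_{\lambda\ni(x,y)}p(\lambda)-1\right]$. The sketch below computes it from table \ref{tab:expectation_value_table}, encoding $0,1,\omega,\bar{\omega}$ as the integers $0,1,2,3$ so that addition is a bitwise XOR. The assignment of bases to line families is an assumption made for this illustration: the first basis goes to the vertical lines $x=c$, and the remaining four go to slopes $0,1,\omega,\bar{\omega}$. Projectors are likewise assigned to intercepts in table order. With this assignment, the sketch reproduces the values of $W(x,y)$ given above to within rounding.

```python
import numpy as np


def gf4_mul(a, b):
    """F_4 product with elements encoded 0, 1, 2 = omega, 3 = omega-bar."""
    prod = 0
    for bit in range(2):
        if (b >> bit) & 1:
            prod ^= a << bit
    return prod ^ 0b111 if prod & 0b100 else prod   # reduce x^2 -> x + 1


# Relative frequencies from the table: one row per MUB, projectors in table order.
FREQ = np.array([
    [0.5254, 0.0114, 0.0058, 0.4575],  # HH, HV, VH, VV
    [0.1980, 0.3087, 0.2745, 0.2188],  # DR, DL, AR, AL
    [0.3101, 0.2237, 0.2445, 0.2217],  # RD, RA, LD, LA
    [0.1004, 0.0975, 0.3688, 0.4333],  # first entangled basis
    [0.1068, 0.0762, 0.4510, 0.3660],  # second entangled basis
])


def wigner_from_frequencies(freq):
    """W[y, x] = (sum of p over the five lines through (x, y) - 1) / 4.

    Assumed assignment: MUB 0 <-> vertical lines x = c (c = projector index);
    MUB m + 1 <-> lines y = m x + b of slope m (b = projector index).
    """
    W = np.empty((4, 4))
    for x in range(4):
        for y in range(4):
            total = freq[0, x]
            for m in range(4):
                # Intercept of the slope-m line through (x, y): b = y + m x.
                total += freq[m + 1, y ^ gf4_mul(m, x)]
            W[y, x] = (total - 1.0) / 4.0
    return W


if __name__ == "__main__":
    # Rows printed from y = omega-bar (top) down to y = 0, as in the text.
    print(np.round(wigner_from_frequencies(FREQ)[::-1], 4))
```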

To my knowledge this is the first time that anyone has
reconstructed a discrete Wigner function of a two-qubit state
from experimental data.  It represents an intriguing way of
describing the quantum state of a two-qubit system.  While the
density matrix contains direct probability information about one
basis along its diagonal, the Wigner function contains
probability distributions for five different bases simultaneously
in the geometry of the associated lines in finite phase space.
Just as the density matrix can be unitarily transformed to put
another basis along the diagonal, the Wigner function can be rotated
unitarily so as to change the five bases whose expectation
values are directly described, so long as the five bases are
mutually-unbiased.  This makes it an appealingly compact
representation of quantum information.  As a bonus, the Wigner function is real,
rather than complex-valued like the density matrix.  

Apart from its practical usefulness, the discrete Wigner
function creates a remarkable link between the theory of quantum
state estimation and abstract number theory.  The very
compactness of the description highlights the limits on the
strength of quantum correlations, since any change in the
Wigner function made to obtain a particular set of outcomes in one
basis will inevitably affect the probability distributions in
the other bases.  It seems unlikely to be a coincidence that this set of
tradeoffs is perfectly and succinctly encapsulated in the geometry of finite
fields.  Understanding this link remains an area of active research.

In short, the discrete
Wigner function represents a novel and illuminating
description of the quantum state which may some day play an
important role in the description of experimental quantum
systems and may point to a deeper underlying geometry of quantum
measurements.
\section{Summary}
We have, for the first time, demonstrated an optimal PVM
quantum state tomography protocol.  Optimality was achieved by making use
of mutually-unbiased bases, a set of PVMs that have minimal RMS
overlap and hence require the least amount of redundant
information to be collected to characterize the quantum state.
In our experimental implementation of the protocol we observed a
significant improvement over previous methods, especially for
entangled states.

MUBs tomography also allows very simple construction of discrete
Wigner functions, an alternative to the density matrix as a way
of describing quantum states.  We obtained the discrete Wigner
function of a state directly from measurements and showed that it
represents an extremely compact and intuitive description that
may help us to understand the geometric structure underpinning quantum
correlations.
